A Short Text Classification Algorithm Based on Semantic Extension
Abstract
A semantic-extension-based algorithm for short texts is proposed. It uses the Word2vec and LDA models to improve classification performance, which is frequently degraded by weak semantic dependencies and scarce features. For every keyword within a text, weighted synonyms and related words are generated by Word2Vec and LDA respectively, and are then inserted to extend the text to a reasonable length. We established a criterion, based on similarity estimation, to determine whether a sentence should be extended, and we also designed a scheme to choose the number of extended words. The extended texts are then classified. Experimental results show that the precision of the proposed algorithm is approximately 5% higher than that of the TF-IDF model and 10% higher than that of the VSM method.
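The keyword-extension step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a hypothetical `TOY_VECTORS` table of three-dimensional vectors stands in for a trained Word2Vec model, cosine similarity serves as the weight of each candidate synonym, and the hypothetical parameters `target_len` and `threshold` replace the paper's similarity-based extension criterion and word-count scheme, whose details the abstract does not give. The LDA related-word step is omitted.

```python
import math

# Hypothetical toy vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "stock":  [0.9, 0.1, 0.0],
    "shares": [0.85, 0.15, 0.05],
    "market": [0.7, 0.3, 0.1],
    "rise":   [0.2, 0.9, 0.1],
    "climb":  [0.25, 0.85, 0.15],
    "banana": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_neighbours(word, vectors, threshold):
    """Return (neighbour, similarity) pairs at or above threshold, most similar first."""
    if word not in vectors:
        return []
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    kept = [(w, s) for w, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def extend_text(tokens, vectors, target_len=8, threshold=0.9):
    """Append similar words to a short text until it reaches target_len tokens.

    Hypothetical criterion: a text that already has target_len tokens is not
    extended; otherwise each keyword contributes at most an equal share
    (rounded up) of the missing words, skipping words already in the text.
    """
    missing = target_len - len(tokens)
    if missing <= 0:
        return list(tokens)
    keywords = [t for t in tokens if t in vectors]
    if not keywords:
        return list(tokens)
    per_keyword = math.ceil(missing / len(keywords))
    extended = list(tokens)
    for word in keywords:
        candidates = [
            w for w, _score in weighted_neighbours(word, vectors, threshold)
            if w not in extended
        ][:per_keyword]
        for neighbour in candidates:
            if len(extended) >= target_len:
                return extended
            extended.append(neighbour)
    return extended


print(extend_text(["stock", "rise"], TOY_VECTORS, target_len=4))
```

Splitting the missing length evenly across keywords keeps a single keyword from dominating the extended text; allocating words in proportion to keyword weight would be an equally plausible reading of the abstract.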
Similar Resources
Short Text Classification Based on Improved ITC
Long text classification has achieved great success, but short text classification still needs to be improved. In this paper, we first describe why we select the ITC feature selection algorithm rather than the conventional TFIDF, and the superiority of ITC over TFIDF; we then summarize the flaws of the conventional ITC algorithm, and then we present an improved ITC feature selec...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
A Novel Multiclass Text Classification Algorithm Based on Multiconlitron
A novel multiclass text classification algorithm based on multiconlitron is proposed. A multiconlitron is constructed for each possible pair of classes in the sample space, and each one separates its two classes. For a sample to be classified, every multiconlitron judges its class and votes for the corresponding class. The final class of the sample is determined by the number of ...
Language independent semantic kernels for short-text classification
Short-text classification is increasingly used in a wide range of applications. However, it still remains a challenging problem due to the insufficient nature of word occurrences in short-text documents, although some recently developed methods which exploit syntactic or semantic information have enhanced performance in short-text classification. The language-dependency problem, however, caused...
Arabic Semantic Text Classification Based on Wavelet Spectral Analysis
We propose in this paper a new document representation in Text Mining based on signal representation and spectral processing by Wavelets Transform. Our method offers a solution to the syntactic and semantic descriptor dependency problem without deleting information. This can be done by grouping dependent descriptors in clusters with a single representative. Thereafter each class is represented by a...
Journal
Journal title: Chinese Journal of Electronics
Year: 2021
ISSN: 1022-4653, 2075-5597
DOI: https://doi.org/10.1049/cje.2020.11.014